Machine learning prediction model based on enhanced bat algorithm and support vector machine for slow employment prediction

The employment of college students is an important issue that affects national development and social stability. In recent years, the growing number of graduates, rising employment pressure, and the epidemic have made the phenomenon of 'slow employment' increasingly prominent, and it has become an urgent problem to be solved. Using data mining and machine learning methods to analyze and predict the employment prospects of graduates, and thereby to provide effective employment guidance and services for universities, governments, and graduates, is a feasible way to alleviate the 'slow employment' problem. Therefore, this study proposes a feature selection prediction model (bGEBA-SVM) based on an improved bat algorithm and a support vector machine, using data from 1,694 college graduates of the class of 2022 in Zhejiang Province. To improve the search efficiency and accuracy for the optimal feature subset, this paper proposes an enhanced bat algorithm that uses a Gaussian distribution-based strategy and an elimination strategy to optimize the feature set. The selected training data are then input to the support vector machine for prediction. The proposed method is compared with peer algorithms and well-known machine learning models on the IEEE CEC2017 benchmark functions, public datasets, and the graduate employment prediction dataset. The experimental results show that bGEBA-SVM achieves higher prediction accuracy, reaching 93.86%. In addition, further education, student leader experience, family situation, career planning, and employment structure are the more relevant characteristics that affect employment outcomes. In summary, bGEBA-SVM can be regarded as an employment prediction model with strong performance and high interpretability.


Introduction
Employment is the most basic livelihood, and the employment of college students is an important issue related to people's livelihood and the stable development of society. In recent years, China's economic development has entered a new normal stage, and both the number of graduates and employment pressure have increased. As a result, more college graduates do not intend to be employed immediately after completing their studies and thus actively or passively become part of the 'slow employment' group. The phenomenon of 'slow employment' has become a prominent problem in the employment of college graduates, and the impact of the epidemic has made it more prominent still. It has a considerable negative impact on China's economic development, social stability, and talent training.
Understanding the basic situation of the 'slow employment' group of college students, analyzing the causes of the 'slow employment' phenomenon, and identifying the key influencing factors can provide a set of scientific and effective solutions to prevent 'slow employment' behavior and resolve the 'slow employment' phenomenon. This is of great significance for promoting higher quality and fuller employment of graduates. The employment landscape for college students and graduates has become more challenging due to the COVID-19 pandemic [1]. Shi [2] pointed out that the record-breaking number of graduates and the career decision-making problems caused by the economic recession have become obstacles to the active employment of college students. Researchers and college educators are increasingly concerned about graduate employment and the potential problems in the future career choices of current students. Li and Zhang [3] constructed a decision tree model based on big data to help college students with the problem of slow employment in terms of scientific decision-making, precise guidance, accurate service, and time-sensitive assistance. Wang and Li [4] verified the mechanism by which employment value influences the willingness to choose slow employment, using a questionnaire survey of students at several universities in the Haidian and Changping districts of Beijing. The study found that long-term post-employment income and employment costs have a facilitating effect on positive 'slow employment' choices, that employment anxiety only plays a mediating role, and that short-term income has a negative effect on 'slow employment'. That study contributes to a deeper understanding of college students' career choices and provides a reference for full employment.
At present, data mining technology has been applied in the fields of academic early warning [5], career planning [6], and teaching assessment [7]. Databases of student employment information also contain a large number of valuable patterns. With the development of computer technology, machine learning methods [8] and hybrid models [9,10] have gradually come into public view and become important tools for solving prediction problems. These techniques have also received attention in the field of education. Rahman et al [11] used data mining techniques for feature selection, predicted graduate employment using K-Nearest Neighbor, Naive Bayes, Decision Trees, Neural Networks, Logistic Regression, and Support Vector Machines, and then analyzed the data with Rapid Miner. Chen et al [12] used the road factor score approach to systematically analyze and assess the comprehensive employability of graduates in order to scientifically guide students in their search for suitable careers. Bharambe et al [13] used data mining techniques to assess the employability of students, using a classification model to predict which types of companies the skills acquired by students are suitable for. Zhao et al [14] used a random forest algorithm to select features for the employee retention rate of 'double-class' university graduates. Based on principal component analysis, they found that economic factors such as regional gross domestic product, wages, the average sales price of commercial properties, and the unemployment rate were the main factors affecting employment mobility in each province and city; a back propagation neural network was then used for prediction and obtained high accuracy and stability. Zhao et al [15] proposed a model for predicting the employment situation of college graduates based on long short-term memory recurrent neural networks. The model can effectively reflect the complex characteristics of university graduate employment data and the nonlinear dynamic interaction of influencing factors, and the data that mainly affect the employment situation were selected for prediction. Compared with the traditional statistical method based on cluster analysis, the technique showed higher prediction accuracy and reliability. Tu et al [16] proposed a model for predicting the entrepreneurial intentions of graduate students based on a chaotic local search sine cosine algorithm, random forest, and support vector machine, and demonstrated the importance of components such as major, gender, general student type, grade point average, and total credits in influencing entrepreneurial intentions. Gao et al [17] proposed an intelligent prediction model of employment stability that combines a multigroup slime mould algorithm with support vector machines. The model showed better prediction results and demonstrated the association of current employment monthly salary, first employment monthly salary, change of employment location, degree of first employment major affiliation, and salary difference with students' employment stability.
To predict and assess the 'slow employment' phenomenon of college students, this work presents and uses bGEBA-SVM, a wrapper feature selection approach based on the improved bat algorithm (GEBA) and a support vector machine. First, to enhance the optimization capability of the feature subset search method, a Gaussian distribution-based strategy and an elimination strategy were introduced into the bat algorithm to enhance its global optimization capability and to improve the population quality.
The remaining structure of the paper is set up as follows. Section 2 presents the graduate employment prediction dataset and the proposed bGEBA-SVM prediction method. Section 3 implements a benchmark function experiment based on IEEE CEC2017 to validate GEBA's optimization capacity. Section 4 validates the prediction ability of bGEBA-SVM with the public datasets and the graduate employment prediction dataset. Section 5 discusses the suggested approach and the experimental results in further detail. Section 6 summarizes this study and proposes goals for future research based on the existing foundation.

Graduate employment prediction dataset
In this study, 1,694 graduates of the class of 2022 from Zhejiang universities were selected, and predictions were made based on 18 characteristics. The details of the Graduate Employment Prediction (GEP) dataset are shown in Table 1.
Since the above-mentioned data did not involve ethical issues, the review committee/ethics committee of Wenzhou Vocational College of Science and Technology granted an exemption from ethical review.

Bat algorithm
The bat algorithm mainly simulates the behavior of bats that forage for tiny insects by means of an echolocation system. In the bat algorithm, each bat (search agent) flies at a random velocity v_i at the location x_i (a solution of the problem), while the bats have different wavelengths, loudness A_i, and pulse emission rate r. When a bat finds prey, its frequency, loudness, and pulse emission rate change in order to select the best solution. The detailed procedure of the bat algorithm is as follows.
First, each bat generates an ultrasonic frequency f_i at random, as shown in Eq (1).
where β is a random vector with components between 0 and 1, and f_min and f_max are the minimum and maximum values of f_i, which are set to 0 and 2, respectively. Then, each bat updates its velocity v_i^t according to its previous velocity v_i^{t-1} and the distance between its previous position x_i^{t-1} and the optimal position x_best, and further updates its current position x_i^t according to v_i^t. Eqs (2) and (3) are used to calculate v_i^t and x_i^t, respectively.
When the global optimal solution is updated, each local solution x_old in the current population is updated, as shown in Eq (4).
where ε denotes a random number obeying a uniform distribution between -1 and 1, and A^t denotes the average loudness of all bats. After the position is updated, the bat makes a greedy choice between the current position x_old and the updated position x_new. When the bat position is updated, the loudness A and the pulse emission rate r are updated, as shown in Eqs (5) and (6).
where α and γ are constants set to 0.9, and the initial values of r and A are set to 0.5. The pseudo-code of the bat algorithm is shown in Algorithm 1.
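As an illustration of the procedure described above, the following Python sketch runs a basic bat algorithm on a toy sphere function. It is a minimal sketch, not the authors' implementation: Eqs (1) to (6) are written in the form commonly given for the standard bat algorithm, which is assumed here to match the equations referenced in the text, and the function name bat_algorithm, the search bounds, the clipping rule, and the exact acceptance test are illustrative assumptions.

```python
import math
import random


def bat_algorithm(objective, dim, n_bats=20, max_iter=100, lower=-100.0,
                  upper=100.0, f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9,
                  seed=0):
    """Minimize `objective` with a basic bat algorithm (illustrative sketch)."""
    rng = random.Random(seed)

    def clip(z):
        # Assumed boundary handling: clamp each coordinate to the search range.
        return max(lower, min(upper, z))

    x = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    fit = [objective(xi) for xi in x]
    loud = [0.5] * n_bats      # loudness A_i, initial value 0.5 as in the text
    rate0 = [0.5] * n_bats     # initial pulse emission rate r_i
    rate = rate0[:]
    b = min(range(n_bats), key=fit.__getitem__)
    best, best_fit = x[b][:], fit[b]

    for t in range(1, max_iter + 1):
        mean_loud = sum(loud) / n_bats           # average loudness A^t
        for i in range(n_bats):
            cand = []
            for d in range(dim):
                freq = f_min + (f_max - f_min) * rng.random()  # assumed Eq (1)
                v[i][d] += (x[i][d] - best[d]) * freq          # assumed Eq (2)
                cand.append(clip(x[i][d] + v[i][d]))           # assumed Eq (3)
            if rng.random() > rate[i]:
                # Assumed Eq (4): random walk scaled by the average loudness.
                cand = [clip(c + rng.uniform(-1.0, 1.0) * mean_loud)
                        for c in cand]
            cand_fit = objective(cand)
            # Greedy selection, gated by loudness (assumed acceptance rule).
            if cand_fit <= fit[i] and rng.random() < loud[i]:
                x[i], fit[i] = cand, cand_fit
                loud[i] *= alpha                                   # assumed Eq (5)
                rate[i] = rate0[i] * (1.0 - math.exp(-gamma * t))  # assumed Eq (6)
            if fit[i] < best_fit:
                best, best_fit = x[i][:], fit[i]
    return best, best_fit


def sphere(p):
    """Toy objective used only for this illustration."""
    return sum(c * c for c in p)


best, best_fit = bat_algorithm(sphere, dim=5, max_iter=50)
```

Because the best solution is only ever replaced by a strictly better one, the returned fitness can never be worse than the best member of the initial population.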

Gaussian distribution-based strategy
To obtain better optimization results, a Gaussian distribution is used to improve the way the original BA updates positions, enhancing the ability of the algorithm to search for the global optimal solution in the search space. The Gaussian distribution-based strategy is calculated from Eq (7).
where Gaussian(μ_i, σ_i) denotes the Gaussian kernel function, which obeys a normal distribution, and μ_i and σ_i denote the mean and standard deviation, respectively, as shown in Eqs (8) and (9).

Elimination strategy
In this study, the elimination strategy is introduced into BA to improve the optimization capability of the algorithm. Replacing the poorer search agents in the population with search agents derived from the optimal solution x_best and the suboptimal solution x_sub can improve the population quality, prevent the search agents from overexploiting the neighborhood of poor solutions, and thus improve the accuracy of the algorithm. The mathematical model of the elimination strategy is as follows.
First, the individual information of the optimal solution x_best and the suboptimal solution x_sub is used to generate a new reference search agent x_ref, as shown in Eq (10).
Then, the worst 5% of individuals in the population are updated according to Eqs (11) and (12).
where k and Ra_2 are random numbers between 0 and 1 that obey a uniform distribution.

Implementation of GEBA
To improve the optimization performance of the original BA, the Gaussian distribution-based strategy and the elimination strategy were introduced in this study. The Gaussian distribution-based strategy is used to improve the convergence accuracy of the algorithm, so Eq (4) in the original BA was replaced by Eqs (7) to (9). In the early stages of optimization, a more diverse population can speed up the algorithm's convergence and increase the likelihood that the algorithm will find the best solution. In the late stages, however, high population diversity may cause the algorithm to waste resources on evaluating inferior solutions, which is not conducive to obtaining high-quality solutions. Thus, as the pseudo-code of GEBA shows, after all search agents have been updated by BA, the elimination strategy replaces the inferior solutions in the population with new search agents in the region of the optimal solution, which promotes a balance between global exploration and local exploitation. The pseudo-code for GEBA is shown in Algorithm 2.

Proposed prediction model bGEBA-SVM
Feature selection may be regarded as a discrete optimization problem, in which a value of '1' indicates that a feature has been chosen and a value of '0' indicates that it has not. GEBA is therefore further modified by discretization to create bGEBA, a variant that can be used to optimize in the discrete space. An S-shaped function was selected as the transformation function of GEBA in this study.
The j-th component of the i-th search agent, x_{i,j}, is input into the S3 function of the S-shaped family. If a random number between 0 and 1 is smaller than the output value s, the component xb_{i,j} of this search agent in the discrete space is set to 1; otherwise it is set to 0. Eqs (13) and (14) show the calculation.
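The discretization step can be sketched in Python as follows. The sketch assumes the S3 definition commonly used for S-shaped transfer functions, s = 1 / (1 + exp(-x / 2)); the exact form of Eq (13) may differ, and the names s3 and binarize are hypothetical rather than taken from the released code.

```python
import math
import random


def s3(x):
    """Assumed S3 transfer function, 1 / (1 + exp(-x / 2)), in a stable form."""
    z = x / 2.0
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)          # avoids overflow for very negative inputs
    return e / (1.0 + e)


def binarize(position, rng=random):
    """Map a continuous position to a 0/1 feature mask.

    A component becomes 1 when a uniform random number in [0, 1) is smaller
    than the transfer function output s, and 0 otherwise.
    """
    return [1 if rng.random() < s3(x) else 0 for x in position]


# Hypothetical continuous position of one search agent with four features.
mask = binarize([2.5, -0.7, 0.0, -4.0], random.Random(1))
```

Large positive components are therefore almost always selected, large negative components are almost never selected, and a component of 0 is selected with probability 0.5.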
Support vector machine (SVM) [38] has excellent generalization performance as well as good performance on nonlinear and nonconvex problems, so SVM was chosen as the classifier for the GEP dataset in this study. Furthermore, a wrapper feature selection method based on the combination of bGEBA and SVM is proposed, which is called bGEBA-SVM. The prediction process of bGEBA-SVM for the GEP dataset is shown in Fig 1 (the code is available at https://github.com/Forproject1111/bGEBA-SVM).
This approach makes use of bGEBA's strong search ability to optimize the feature subset. The feature subset is fed into the SVM as input for training, and the SVM's prediction output serves as the evaluation criterion (fitness function) for the feature subset. The fitness value is calculated using Eq (15). The process is iterated until the feature subset that gives the classifier the best performance is obtained.
where ω_1 and ω_2 are two weight parameters that measure the impact of the model's prediction accuracy and of the number of selected features, respectively, on the evaluation of a feature subset. Since the prediction accuracy of the model is the focus of the study, ω_1 and ω_2 are set to 0.99 and 0.01, respectively.
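To make the role of the two weights concrete, the following Python sketch evaluates one feature mask. It assumes the weighted form that is common in wrapper feature selection, fitness = ω_1 × error rate + ω_2 × (selected features / total features), with lower values being better; the exact form of Eq (15) may differ. The function name wrapper_fitness and the handling of an empty feature subset are hypothetical.

```python
import math


def wrapper_fitness(error_rate, mask, w1=0.99, w2=0.01):
    """Assumed fitness of a feature subset (lower is better).

    error_rate: classification error of the SVM trained on the selected
                features, i.e. 1 - Accuracy, supplied by the caller.
    mask:       0/1 list marking which features are selected.
    """
    if not any(mask):
        # Hypothetical guard: a classifier cannot be trained on zero features.
        return float("inf")
    return w1 * error_rate + w2 * sum(mask) / len(mask)


# Toy numbers: 10% error with 2 of 4 features selected.
value = wrapper_fitness(0.10, [1, 0, 1, 0])
```

With ω_1 = 0.99 and ω_2 = 0.01, the error term dominates, and the feature-count term mainly separates subsets that have nearly the same accuracy.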

Experiments on benchmark functions
In a prediction method based on wrapper feature selection, a swarm intelligence optimization algorithm is the key component that optimizes the subset of features input to the model for training. Therefore, the optimization capability of the swarm intelligence optimization algorithm is one of the important factors affecting the prediction results. To explore the optimization performance of GEBA, this section sets up benchmark function experiments for validation.

Experiment setup
The search behavior of the algorithm can be divided into global exploration and local exploitation. The global exploration capability indicates the algorithm's ability to search for optimal solutions in unknown regions of the search space, which increases the probability of avoiding local extremes. However, the probability of obtaining a poor solution is likewise elevated, which is not conducive to improving the accuracy of the current optimal solution. The local exploitation capability indicates the ability of the algorithm to further exploit the neighborhood of the current solution and improve the quality of the optimal solution, but this may cause the algorithm to become trapped in a local optimum. When the global exploration ability and the local exploitation ability of the algorithm are balanced, the optimization ability of the algorithm can be fully utilized, and better optimization results can be obtained. To verify the optimization performance of the algorithm more comprehensively, this section tests it on the IEEE CEC2017 benchmark functions [36]; the details of IEEE CEC2017 are shown in Table 2. In addition, to ensure the fairness of the experimental results, the public parameters of the benchmark function experiments are set uniformly; the public parameters and the experimental environment are shown in Table 3.

Ablation experiment
To verify the significance of the Gaussian distribution-based strategy and the elimination strategy for the performance improvement of the algorithm, this subsection sets up ablation experiments on the optimization strategies. The two optimization strategies were introduced into BA separately, and the details of the methods compared in the ablation experiment are shown in Table 4. The comparison was carried out using the rankings of two nonparametric tests, the Wilcoxon signed-rank test (WSRT) and the Friedman test (FT), as well as convergence tests.
Table 5 shows the comparison results and rankings of the four methods. The average rankings of GEBA for WSRT and FT are 1.37 and 1.88, respectively, which is the best performance, followed by GBA. The two tests rank EBA and BA differently: EBA performs better in the WSRT ranking, while in the FT ranking the introduction of the elimination strategy instead reduces the optimization performance of the algorithm. However, the comparison between GEBA and GBA shows that the elimination strategy improves GBA more noticeably than it improves BA. Fig 2 shows the convergence of the four methods. It is clear from the figure that the convergence curve of GEBA lies below those of all other methods, and that GEBA converges faster except on the F18 function.
In summary, the experiment in this subsection demonstrates that the combination of two optimization strategies outperforms the optimization effect of individual strategies and has more significant performance improvement for algorithm optimization.
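The average rankings reported for the Friedman test can be illustrated with the following Python sketch, which ranks the methods on every benchmark function (lower error is better, ties share the average rank) and then averages the ranks over the functions. This is a simplified sketch of the ranking step only, not of the full statistical test; the function name average_ranks, the method labels, and the error values are hypothetical toy data, not results from this study.

```python
import math


def average_ranks(scores):
    """Average per-function rank of each method (rank 1 is best).

    scores: dict mapping a method name to a list of errors, one per function.
    Tied methods share the mean of the ranks they occupy.
    """
    names = list(scores)
    n_funcs = len(next(iter(scores.values())))
    totals = {name: 0.0 for name in names}
    for j in range(n_funcs):
        ordered = sorted(names, key=lambda n: scores[n][j])
        pos = 0
        while pos < len(ordered):
            end = pos
            # Extend the group while the next method has the same error.
            while (end + 1 < len(ordered)
                   and scores[ordered[end + 1]][j] == scores[ordered[pos]][j]):
                end += 1
            shared = (pos + end) / 2 + 1   # mean of 1-based ranks pos+1..end+1
            for name in ordered[pos:end + 1]:
                totals[name] += shared
            pos = end + 1
    return {name: totals[name] / n_funcs for name in names}


# Toy errors for three hypothetical methods on three functions.
toy = {
    "method_a": [1.0, 2.0, 3.0],
    "method_b": [2.0, 2.0, 1.0],
    "method_c": [3.0, 5.0, 2.0],
}
ranks = average_ranks(toy)
```

In this toy example method_b obtains the lowest average rank, so it would be listed first.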

Search history analysis
To explore how GEBA searches for optimal solutions to different optimization problems, this subsection analyzes the 1-dimensional search history, the 2-dimensional top view of the function, and the average fitness value error. In addition to illustrating the gradual convergence of GEBA to the optimal solution, the average fitness value convergence curve also indicates that the differences among the search agents become smaller and more consistent with the optimization process in the later iterations. In summary, in the optimization process, GEBA gradually converges to the optimal position based on the information provided by all search agents, the global exploration of the

Stability experiment
To explore the stability of the algorithm in handling high-dimensional optimization problems, this subsection tested the optimization performance of GEBA and BA at 50 and 100 dimensions.
Table 6 shows the comparative results and rankings of the algorithms tested at high dimensions. GEBA outperforms BA on more than 19 functions in both dimensions, whereas BA performs better on only one function. On the remaining functions there is no significant difference between the two algorithms, and the optimization performance of GEBA and BA is approximately equal. The rankings of the two nonparametric statistical tests, WSRT and FT, show that GEBA has better optimization capability and robustness in handling different optimization problems. In addition, the convergence speed and convergence accuracy of GEBA can be seen in Figs 4 and 5, which show that GEBA performs well in high dimensions.

Comparative experiment on GEBA with advanced peers
To objectively evaluate the algorithm optimization performance, GEBA was compared with 10 advanced similar methods, and the parameters of the 10 comparison methods were set as shown in Table 7.
First, the performance of the 11 methods was verified by evaluating the mean and standard deviation of 30 independent runs on each function, and the experimental results are shown in Table 8. The data in the table show that GEBA is the best performing algorithm on the 30 benchmark functions, with 16 optimal means and 7 minimum standard deviations. Although GEBA is less stable than FA, its optimization performance is still stable among all compared methods, and GEBA has better overall optimization performance. Second, the WSRT and FT statistical comparisons in Table 9 also show that GEBA ranks higher than the other methods. In the P-value comparisons between GEBA and the other methods, the number of P+ results is at least 19 and the number of P- results is at most 6. This indicates that GEBA performs better and that these results are statistically significant. For the WSRT ranking, GEBA had the best average ranking of 2.50, followed by FA. For the FT ranking, GEBA had the best average ranking of 3.20, followed by RCBA. Third, the convergence curves of the 11 algorithms are shown in Fig 6. The curves show that GEBA reaches convergence at about 20,000 to 50,000 evaluations and that its solution quality is higher.

Experiment setup
This section sets up a series of experiments to verify the predictive performance of bGEBA-SVM. bGEBA-SVM was compared with five similar methods and four popular prediction models: bBA-SVM [22], bSCA-SVM [24], bGWO-SVM [50], bHHO-SVM [51], bSMA-SVM [21], Back Propagation Neural Network (BP) [52], Classification And Regression Tree (CART) [53], Random Forest (RandomF) [54], and Adaptive Boosting (AdaBoost) [55]. To ensure the fairness of the experiment, this study unified the experimental setup. The population size (N) was set to 20, the dimension was determined by the dataset, and the maximum number of iterations (Max_Iter) was set to 100. In addition, to reduce the influence of randomness on the experimental results, each experiment was run independently 10 times, and the mean and standard deviation of the results are discussed.
In addition, to obtain a more accurate assessment of the model performance, this study comprehensively evaluated the prediction ability of the model using four assessment metrics of the prediction results: Accuracy, Sensitivity, Matthews correlation coefficient (MCC), and F-measure. The details are shown in Table 10.
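As an illustration, the four metrics can be computed from the entries of a binary confusion matrix as in the following Python sketch. It uses the standard textbook definitions, which are assumed to agree with Table 10; the function name classification_metrics and the confusion-matrix counts are hypothetical toy values, not results from this study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, Sensitivity, MCC, and F-measure from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    if precision + sensitivity:
        f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "mcc": mcc,
        "f_measure": f_measure,
    }


# Toy counts: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Unlike Accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the two classes are imbalanced.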

Transformation function experiment
When optimization methods are discretized, the use of different transformation functions can cause differences in prediction results [56]. Therefore, to improve the prediction performance of the model on the GEP dataset, this subsection tests the eight most common transformation functions [57] of the S-shaped and V-shaped families. The experimental setup is shown in Table 11.
Fig 7 shows the four evaluation metrics for the prediction of the GEP dataset by the eight discretization methods. As can be seen from the figure, S3-bGEBA-SVM performs the best among the eight variants, with Accuracy, Sensitivity, Matthews correlation coefficient, and F-measure reaching 93.86%, 88.65%, 0.8816, and 93.36%, respectively. This suggests that the S3 transformation function is the most appropriate choice for GEBA to optimize the feature space and ultimately produce better prediction results. Therefore, this study set the S3 function as the default discretization method in the subsequent experiments.

Comparative experiment on the public datasets
To verify the generalization ability of the proposed method, bGEBA-SVM was compared with the other methods on six public datasets [58,59]. Table 12 shows the details of these datasets.
Tables 13 to 18 show the means and standard deviations of the four evaluation metrics for the prediction results of the 10 prediction methods on the public datasets. The data in the tables show that bGEBA-SVM has the best mean and standard deviation of the various metrics on all datasets, except for the standard deviation of the four evaluation metrics on the Heart dataset. This indicates that bGEBA-SVM has better prediction ability on public datasets and that its prediction performance is stable, making it a method with excellent generalization ability.

Comparative experiment on the employment prediction dataset
In this subsection, to achieve a more accurate prediction of graduate employment, bGEBA-SVM was used to predict the GEP dataset, and its superior prediction performance was verified by comparing it with other methods. In the wrapper feature selection method, bGEBA was used as the optimization method to optimize the feature subset, and the set of features that favors the prediction accuracy of the model was fed into the classifier. The optimization performance of bGEBA is therefore a key factor affecting the prediction accuracy of the model. Accordingly, the convergence curves of bGEBA and its peers on the objective function used to evaluate feature subsets were analyzed, as shown in Fig 8. The curves show that, compared with the other methods, bGEBA can jump out of local optima and thus obtain a higher-quality feature subset. Combining the optimization results of bGEBA for feature subsets and the above prediction results,

Important features analysis
The proposed bGEBA-SVM method has strong interpretability. To verify the role played by each feature in the prediction process, feature importance experiments were set up in this study.
Fig 10 shows the number of times each feature was selected in 100 feature selection experiments. The results are consistent with practical experience: the features A11, A4, A12, A10, A2, A13, A3, and A1 are selected more often and have a more significant impact on predicting the employment of graduates.
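The selection counts behind this kind of analysis can be obtained by tallying, over repeated runs, how often each feature appears in the final feature mask. The following Python sketch shows one way to do this; the function name selection_counts and the three toy masks are hypothetical and do not reproduce the data of this study.

```python
from collections import Counter


def selection_counts(masks, names):
    """Count how often each named feature is selected across runs.

    masks: list of 0/1 lists, one final feature mask per run.
    names: feature labels in the same order as the mask components.
    Returns (name, count) pairs sorted from most to least frequently selected.
    """
    counts = Counter()
    for mask in masks:
        counts.update(name for name, bit in zip(names, mask) if bit)
    return counts.most_common()


# Toy example with three features and three runs.
toy_names = ["A1", "A2", "A3"]
toy_masks = [[1, 0, 1], [1, 1, 0], [1, 0, 0]]
ranking = selection_counts(toy_masks, toy_names)
```

Features that are selected in most runs are the ones the wrapper repeatedly finds useful for the classifier, which is the basis for reading the counts as feature importance.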

Discussion
The results of the experiment show that there are significant correlations among the factors influencing college students' choice of 'slow employment', such as preparation for further education, student leadership experience, family situation, career planning, employment structure, family parenting style, student category, gender, and professional interest. Among them, career planning, academic achievement, self-concept, and family situation are negatively correlated with the intention of 'slow employment': the clearer the career planning, the better the academic achievement, the clearer the self-concept, and the better the family's economic situation, the lower the 'slow employment' intention of college students [60,61]. Three other aspects should also be considered. First, among students with a 'slow employment' intention, the number who actively choose 'slow employment' is increasing because they are pursuing further education, and more of them have experience as student leaders during their college years. Among those who actively choose 'slow employment', one point is worth noting: there are more science and technology majors than economics and management majors, while agriculture majors have a lower intention to actively choose 'slow employment' [61][62][63]. Second, family upbringing also influences students' 'slow employment' intention: the more democratic the family upbringing, the stronger the intention. Excessive family interference or a lack of family involvement can make career decision-making difficult, leading young people to make poor choices [2]. Relatively democratic families, on the other hand, allow young people to be less influenced by external pressures and more diversified in their career choices, but this is more likely to lead to missed opportunities. Family influence is certainly a factor worth noting in the case of slow employment. Third, students' 'slow employment' intention is influenced to a certain extent by their place of origin, and students from rural areas have a higher 'slow employment' intention than those from urban areas.
It can also be concluded that the 'slow employment' of college students should be analyzed scientifically and treated rationally. This paper established a scientific 'slow employment' prediction and evaluation model that uses the improved bat algorithm as the feature subset search method and uses the screened key features to train the classification model. The proposed evaluation model can provide reasonable decision support for universities in addressing the 'slow employment' of college students. First, it can help colleges and universities to classify and guide students' employment, predict employment intentions, grasp the employment needs and difficulties of 'slow employment' students in time, and provide employment guidance and services for different types of students. Second, it can help colleges and universities to make precise policies, establish accurate baseline figures, deliver job opportunities, offer targeted help, and implement personalized employment services. Third, it can help all parties to work together to promote student employment. Colleges and universities can work with families, society, and other parties to help graduates understand the employment situation, strengthen their abilities, master the pace of job hunting, intervene more effectively in 'slow employment' behavior, and actively guide its positive transformation. In this way, 'slow employment' can be assessed, predicted, and prevented, promoting higher quality and fuller employment of graduates.

Conclusion and future works
This study presented a wrapper feature selection approach (bGEBA-SVM) built on an improved bat algorithm and a support vector machine. To ensure that better feature subsets can be obtained, a Gaussian distribution-based strategy and an elimination strategy were incorporated into the original bat algorithm, yielding an effective and reliable optimization technique (GEBA). The experimental findings on IEEE CEC2017 showed that bGEBA offers considerable optimization performance benefits over advanced comparable algorithms. bGEBA was also objectively evaluated against several sophisticated approaches. To obtain a discrete version suitable for feature selection, GEBA was further discretized (bGEBA). bGEBA provides effective training data for the support vector machine and achieves more accurate predictions on the graduate employment prediction dataset. Through experimental tests and analysis, we found that school, student leadership experience, family situation, career planning, employment structure, family parenting style, student category, gender, professional interest, and other factors have a greater influence on graduates' positive and negative responses to the choice of "slow employment". Analyzing these factors facilitates more accurate prediction of graduate employment problems and the development of effective measures.

Although the proposed model performs stably and well on the graduate "slow employment" prediction problem, its performance is still limited by the optimization efficiency and by the performance of the classification model. In future research, we plan to incorporate high-performance computing techniques such as distributed optimization to address the efficiency of feature subset optimization, and to use machine learning models with stronger predictive performance and compatibility. Additionally, the optimization capabilities of the proposed GEBA may be applied to image segmentation [64] and engineering optimization challenges [65].
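As a rough illustration of the wrapper idea summarized above, the following Python sketch shows how a continuous search-agent position could be discretized into a feature mask and scored. It is not the authors' implementation: the sigmoid transfer function, the weight `alpha = 0.95`, and the names `binarize`, `wrapper_fitness`, and `error_rate` are all assumptions introduced for illustration. In particular, `error_rate` is a hypothetical callable standing in for the classification error of an SVM trained on the selected features.

```python
import math
import random
from typing import Callable, List, Sequence


def binarize(position: Sequence[float], rng: random.Random) -> List[int]:
    """Turn a continuous position into a 0/1 feature mask.

    Assumption: an S-shaped (sigmoid) transfer function is used; the
    discretization rule of bGEBA itself is not reproduced here.
    """
    mask = []
    for x in position:
        x = max(-50.0, min(50.0, x))       # clamp to avoid overflow in exp
        prob = 1.0 / (1.0 + math.exp(-x))  # probability of selecting the feature
        mask.append(1 if rng.random() < prob else 0)
    return mask


def wrapper_fitness(
    mask: Sequence[int],
    error_rate: Callable[[Sequence[int]], float],
    alpha: float = 0.95,
) -> float:
    """Score a feature subset (lower is better).

    Assumption: a weighted sum of classification error and the fraction of
    selected features, a common choice in wrapper feature selection.
    """
    n_selected = sum(mask)
    if n_selected == 0:
        return 1.0  # an empty subset gets the worst possible score
    return alpha * error_rate(mask) + (1.0 - alpha) * n_selected / len(mask)


if __name__ == "__main__":
    rng = random.Random(0)
    mask = binarize([2.0, -3.0, 0.5, 4.0], rng)
    toy_error = lambda m: 0.10  # hypothetical stand-in for SVM validation error
    print(mask, wrapper_fitness(mask, toy_error))
```

In a full wrapper, `error_rate` would train and validate the classifier on the columns where the mask is 1, which is why evaluating each candidate subset dominates the running time.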

Fig 3 shows the historical search trajectory of GEBA on different optimization problems. Fig 3(A) shows the corresponding 3D image of the function. Fig 3(B) shows that GEBA takes larger steps through the search space in the early iterations; in the later iterations, as the global exploration ability decreases and the local search ability increases, GEBA gradually converges to the optimal value in that dimension. In Fig 3(C), which shows a cross-section of the search space, the black dots indicate the historical locations of all search agents and the red dot indicates the location of the optimal solution. Most of the search agents' locations are clustered, and only a few are scattered across other regions of the search space. For F1, F4, and F28, in addition to the historical positions clustered around the optimal solution, some positions cluster around suboptimal or local optimal solutions. In the end, however, GEBA escapes the local optimum and thus obtains a higher-quality solution.
https://doi.org/10.1371/journal.pone.0294114.g003

population gradually becomes local exploitation, and GEBA has a strong ability to jump out of the local optimal solution.

value fitness_i of x_i^t
If (rand < A_i) && (fitness_i < fitness_best)
Calculate the new position x_new by Eq (4)
Evaluate the fitness value fitness_new of x_new and greedy selection

Algorithm 2. Pseudocode for GEBA.
Initialize the algorithm parameters: f_min, f_max, α, γ, r, A
Initialize the population
Evaluate the fitness value of each search agent
If (rand < A_i) && (fitness_i < fitness_best)
Update μ_i by Eq (8)
Update σ_i by Eq (9)
Calculate the new position x_new by Eq (7)
Evaluate the fitness value fitness_new of x_new and greedy selection
FEs = FEs + 1
Update A^t by Eq (5)
Update r^t by Eq (6)
Update x_ref according to x_best and x_sub by Eq (10)
Update x^t by Eqs (11) to (12)
Evaluate the fitness value fitness_i of x_i^t and greedy selection
End for
FEs = FEs + 2×N
t = t + 1
End while
Return the best solution
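The two recurring steps in Algorithm 2, sampling a candidate from a Gaussian with parameters μ_i and σ_i and then applying greedy selection, can be sketched in Python as follows. This is only a sketch under stated assumptions: Eqs (7) to (9) are not reproduced in this section, so the choices μ = (x_i + x_best)/2 and σ = |x_i − x_best| are hypothetical stand-ins, and the function names `gaussian_candidate` and `greedy_select` are introduced here for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def gaussian_candidate(
    x: Sequence[float], x_best: Sequence[float], rng: random.Random
) -> List[float]:
    """Sample a candidate position dimension by dimension.

    Assumption: mu is the midpoint of the agent and the best solution and
    sigma is their distance; these stand in for Eqs (8) and (9).
    """
    candidate = []
    for xi, bi in zip(x, x_best):
        mu = (xi + bi) / 2.0
        sigma = abs(xi - bi)
        candidate.append(rng.gauss(mu, sigma))
    return candidate


def greedy_select(
    x: Sequence[float],
    fx: float,
    x_new: Sequence[float],
    objective: Callable[[Sequence[float]], float],
) -> Tuple[List[float], float]:
    """Keep the new position only if it improves the (minimized) fitness."""
    f_new = objective(x_new)  # costs one function evaluation (FEs = FEs + 1)
    if f_new < fx:
        return list(x_new), f_new
    return list(x), fx


if __name__ == "__main__":
    rng = random.Random(42)
    sphere = lambda v: sum(t * t for t in v)  # toy objective, not from the paper
    x, x_best = [3.0, -2.0], [0.5, 0.1]
    fx = sphere(x)
    for _ in range(20):
        x, fx = greedy_select(x, fx, gaussian_candidate(x, x_best, rng), sphere)
    print(x, fx)
```

Because a candidate is accepted only when it improves the fitness, the fitness of each agent never gets worse from one iteration to the next, whatever sampling rule is used.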

Table 6. Comparative results of scalability experiment.
Note: N/A indicates a null value; bold indicates the optimal result.
https://doi.org/10.1371/journal.pone.0294114.t006

Combined with the above analysis, GEBA's optimization ability and convergence are significantly improved when dealing with high-dimensional problems.

Table 17. Prediction evaluation results for public dataset Heart.

that bGEBA has a stronger global optimization ability and further improves the prediction accuracy of the model by feeding high-quality training data to the model. Table 19 and Fig 9 show the prediction results for the graduate employment dataset. bGEBA-SVM predicts graduate employment more accurately than its peers and popular prediction models on all four performance evaluation metrics.